Don't track the size of each allocated block any more #19767
This saves us 8 bytes per block on 64-bit builds. We no longer need to traverse the linked list of blocks to check allocated space, which means we also no longer need atomics in the linked list or even its head.

This is especially beneficial because the previous implementation contained a race where we could dereference uninitialized memory. The setting of the `next` pointers did not use release semantics, and `SpaceAllocated` reads them with relaxed order, so there's no guarantee that `size` has actually been initialized. Worse, there is also no guarantee that `next` has been!

Simplified:

So I think a second thread calling `SpaceAllocated` could see the order 1, 4, 5, 6, 7, 2, 3 and read uninitialized memory. There is no data-dependency relationship or happens-before edge that this order violates, so it would be valid for a compiler plus hardware to produce.
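To make the ordering concern concrete, here is a minimal sketch of the kind of publication pattern being discussed. It is not the actual arena code: `Block`, `Arena`, `push_block`, and `space_allocated` are hypothetical names, and a single writer thread is assumed. The writer publishes with a release store and the reader loads the head with acquire; that pairing is what guarantees the reader sees initialized `size` and `next` fields. With relaxed order on both sides, the language gives no such guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the arena structures; not the real types. */
typedef struct Block {
  struct Block *next; /* plain field, written before publication */
  size_t size;        /* plain field, written before publication */
} Block;

typedef struct {
  _Atomic(Block *) blocks; /* head of the block list */
} Arena;

/* Writer (assumed to be the single owning thread): initialize the block,
 * then publish it as the new head. */
void push_block(Arena *a, size_t size) {
  Block *b = malloc(sizeof *b);
  assert(b != NULL);
  b->size = size;
  b->next = atomic_load_explicit(&a->blocks, memory_order_relaxed);
  /* Release: the two field stores above cannot be reordered after this. */
  atomic_store_explicit(&a->blocks, b, memory_order_release);
}

/* Reader: may run on another thread while the writer pushes blocks. */
size_t space_allocated(Arena *a) {
  size_t total = 0;
  /* Acquire pairs with the release store in push_block, so every block
   * reachable from the head has its size and next fields initialized. */
  Block *b = atomic_load_explicit(&a->blocks, memory_order_acquire);
  for (; b != NULL; b = b->next) total += b->size;
  return total;
}
```

Older blocks are covered as well: each was published by an earlier release store on the same thread, so it happens-before the later store that the reader's acquire load observes.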
In reality, operation 4 will produce an `stlr` on arm (forcing an order of 1, 2, 3 before 4), and `block->next` has a data dependency on `ai->blocks`, which would force an ordering in the hardware between 5->6 and 5->7 even for regular `ldr` instructions.

The fix would be for `SpaceAllocated` to read `ai->blocks` with acquire order, but with this CL that's moot. Please check my work, as I'm less familiar with the C/C++ memory model.

Delete arena contains; it's private and the only user is its own test.
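For readers who want a picture of the accounting change in the title, the sketch below shows one way to report allocated space without a per-block size field: keep a single running total on the arena and update it whenever a block is added. This is an assumption about the general approach, not a copy of this CL's code; `CountedBlock`, `CountedArena`, and the `counted_arena_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Block header without a size field: one pointer instead of pointer + size. */
typedef struct CountedBlock {
  struct CountedBlock *next;
} CountedBlock;

typedef struct {
  CountedBlock *blocks;          /* plain pointer, touched only by the owner */
  atomic_size_t space_allocated; /* running total, readable from any thread */
} CountedArena;

void counted_arena_init(CountedArena *a) {
  a->blocks = NULL;
  atomic_init(&a->space_allocated, 0);
}

/* Owner thread only: allocate a block of `size` bytes and account for it. */
void counted_arena_add_block(CountedArena *a, size_t size) {
  assert(size >= sizeof(CountedBlock));
  CountedBlock *b = malloc(size);
  assert(b != NULL);
  b->next = a->blocks;
  a->blocks = b;
  /* Relaxed is enough here: readers only want the number itself and never
   * dereference any block based on it. */
  atomic_fetch_add_explicit(&a->space_allocated, size, memory_order_relaxed);
}

/* Any thread: constant-time read, no list traversal. */
size_t counted_arena_space_allocated(CountedArena *a) {
  return atomic_load_explicit(&a->space_allocated, memory_order_relaxed);
}
```

In this sketch the list itself needs no atomics because only the owning thread walks it; the counter is the only field shared across threads.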